This paper is primarily a working paper. It is published solely to document work performed.


SUMMARY


The purpose of this paper was to review the foundations and current developments of
psychological testing in the military and in other settings. In the first part of the paper, the
issue of the value of psychological tests is addressed by reviewing a number of traditional
validation studies. It is concluded that although tests may be quite useful in predicting
standard outcome criteria, there is a need for developing new tests rooted in cognitive theory,
and for developing richer validation data. In the second part of the paper, a number of recent
studies that have employed testing methods based on cognitive psychology are reviewed. These
studies suggest a number of areas in which cognitive psychology may contribute to the development
of a new approach to ability testing. Such an approach should lead to a broader, more
comprehensive system of ability assessment for personnel selection and classification purposes.
A cognitive approach to ability assessment also suggests many possibilities for diagnosing
particular learner deficiencies.
PREFACE


Production of this paper was supported by the Air Force Learning Abilities
Measurement Program (LAMP), a multi-year program of basic research conducted at the Air
Force Human Resources Laboratory and sponsored by the Air Force Office of Scientific
Research. The goals of the research program are to specify the basic parameters of
learning ability, to develop techniques for the assessment of individuals' knowledge and
skill levels, and to explore the feasibility of a model-based system of psychological
assessment. Other members of the LAMP group, Raymond Christal, William Tirre, and Dan
Woltz, provided valuable and insightful comments on issues discussed in this manuscript.
THEORY-BASED COGNITIVE ASSESSMENT


I.  INTRODUCTION 

In many large organizations, psychological tests are routinely administered to job applicants
to obtain information about their likelihood of succeeding on the job or in prerequisite
training. For historical and economic reasons, such tests typically consist of a variety of
multiple-choice items administered in the paper-and-pencil format. Items are designed to probe
the applicant's ability to reason logically, to think quantitatively, to comprehend verbal
material, and so forth. Traditionally, these abilities and other skills have been called
aptitudes, reflecting the philosophical stance that such skills are properly characterized as
personal traits, which are relatively impervious to training.

In recent years, many psychologists concerned with individual differences have begun
questioning the foundations on which this kind of testing technology has developed. In
particular, the model of intellectual functioning that characterizes individuals in terms of a
small, finite set of relatively stable traits is fast giving way to a whole new conception of
intellectual functioning based on the view of the person as information processor. This
reorientation has given rise to a "new look" in ability measurement that promises to overhaul the
conventional way in which psychological tests are administered, along with the ways in which
performance on such tests is interpreted. The new look substantially builds on traditional
factor-analytic-based accounts of individual differences and borrows heavily from the discipline
of experimental cognitive psychology. As such, the new research approach to individual
differences represents the beginnings of a convergence between what Cronbach (1957) called the
"two disciplines of scientific psychology": the experimental and the correlational. It seems
quite likely at this time that the experimental-cognitive approach to individual differences will
lead to the generation of a whole new system of psychometrics, based on experimentally studied
process models instead of the traditional trait models. This paper presents a detailed account
of what the new information processing approach is likely to bring in the way of new
psychological tests and new ways of using test score data.

Before the new approaches to cognitive assessment are discussed, it is useful to review the
role and status of a psychological testing program in an organization in general. Human ability
assessment can play a broad and critical role in enhancing human productivity and thereby
increasing organizational effectiveness. Consider four obvious ways in which any large
organization can enhance the productivity of its workforce. The first, and often most expensive,
is training. It has been estimated that organizations spend up to 10% of their total payroll on
training (Gilbert, 1976), which can amount to a substantial sum. Training and assessment are
integrally intertwined. An organization trains those most likely to benefit from training, and
day-to-day decisions about what to train next depend on an assessment of what the student or
trainee knows right now. The second, and in many ways the most elusive, means of productivity
enhancement is to increase employees' motivation levels. The capability for assessing an
individual's current motivation level is an obvious prerequisite to evaluating any policy
designed to enhance motivation. The third method for productivity enhancement is to design the
systems with which operators interact in such a way as to optimize the efficiency of the
man-machine interaction. Witness the public attention given to conditions faced by air-traffic
control operators, and note that many follow-up studies have been concerned with the redesign of
systems in order to accommodate human factors (e.g., Hopkin, 198?). Improved human factoring of
the workplace depends to a large degree on adequate assessment methods. Finally, organizational
productivity can be enhanced with an appropriate selection and classification system.

Many managers do not fully appreciate the importance of initial personnel selection and
classification decisions. Numerous research studies attest to the wide variation in the learning
and performance capabilities of individuals (Rimland & Larson, in press). Data collected at the
Air Force Human Resources Laboratory (AFHRL) and elsewhere have shown that some individuals can
acquire skills and knowledge 10 to 20 times faster than others (Payne & Tirre, 1984). Managers
often incorrectly conclude that on-the-job performance deficiencies are due to motivation and
training deficiencies when an equally plausible case could be made for identifying the source of
the problem as selection and classification errors. It is highly unlikely that training can
always overcome a serious talent deficiency; in fact, in high demand areas, individual
differences more often than not are magnified by training and experience (Cronbach & Snow, 1977).

Unfortunately, present personnel measurement tests are not highly accurate in identifying
before the fact who will be the fast and slow learners. Present tests do not measure many of the
abilities required for acquiring skills demanded by diverse occupations. The importance of more
accurate measurement of learning abilities is elevated by forecasts of manning problems in the
next decade. The number of 18- to 21-year-olds in the national manpower pool will decrease by
about 20% in the near future and will remain at this low level through the 1990s. At the same
time, competition between the military and civilian sectors for these scarce manpower resources
is expected to increase as a function of an improving and expanding economy. Clearly, selecting
the best people as a means of productivity enhancement will become increasingly important.

The purpose of this paper is to discuss the role that cognitive assessment techniques can
play in enhancing human productivity. Although the most obvious role for assessment methods is
in the area of personnel selection and classification, it can be argued that improved techniques
can serve all four areas: training, motivation, human engineering, and selection and
classification. The following section begins with a brief review of the history of aptitude
testing, with particular emphasis on the evolution of the testing program within the military
services. The section concludes by pointing out that in recent years conventional notions of
aptitude have come under attack from both within and outside the field, but that recent
theoretical developments in cognitive psychology, coupled with the now almost ubiquitous
microcomputer, promise to change the nature of ability testing and provide it with a firmer
theoretical foundation.

It is important, in discussing a new theory-based approach to assessment, to provide at least
the glimmerings of the theory that serves as the base. Thus, in the third section, a description
of the human as an information processing system is outlined, and how such a view might serve
as a useful foundation for new cognitive-based testing research is discussed. This is followed
by a description of new assessment techniques rooted in cognitive theory, and some of the
studies that have attempted to determine the utility of these new measurement methods and
approaches are reviewed. The studies can be divided naturally into clusters based on the focus
of the investigation. One set of studies is concerned with the question of whether elementary
cognitive tasks can supplement or even replace conventional tests. A second set of studies is
concerned with how the ability to learn can be measured. A third set of studies addresses the
issue of whether complex cognitive skills such as reading can be broken down into more elementary
skills. In these study review sections, particular attention is given to studies that have been
conducted as part of the AFHRL Learning Abilities Measurement Program (Project LAMP). Throughout
this paper, the numerous possible applications of new methods for cognitive assessment are
discussed, beyond the obvious ones in selection and classification contexts. These include
applications in remedial diagnosis, the development of training systems, and the design of
systems to accommodate human factors. Finally, in a summary section, these developments are
reviewed and the cost effectiveness of some of the new forms of cognitive assessment is discussed.
II. HISTORY OF COGNITIVE ASSESSMENT


Up until the mid-1970s, the predominant form of theorizing about cognitive abilities employed
factor-analytic concepts and methods. Perhaps the first theory of cognitive ability was the
two-factor theory proposed by Spearman (1905), who noted that correlation matrices of cognitive
test scores exhibited what he termed "positive manifold," a condition characterized by the
absence of zero or negative correlations. That is, if a person outperforms others on one
cognitive test, that person is likely to do better than others on any other cognitive test.
Spearman attempted to explain this phenomenon by proposing the concept of "general ability," which
itself was defined as the level of "mental energy" available to the person. Spearman assumed
that differences in mental energy level were responsible both for differences in success in
schooling and success on the cognitive tests. Later, Thurstone (1938) modified the Spearman
proposal by postulating seven relatively independent primary mental abilities such as verbal,
spatial, deductive reasoning, and memory abilities. Thurstone contended that to predict a
person's score on any cognitive test, it was not sufficient to know that person's general ability
level because different kinds of abilities played different roles on various tests. More
formally, Thurstone proposed that relative standing on one test could be predicted from relative
standing on other tests by the common factor equation:

    $$ z_{ij} = w_{j1} f_{1i} + w_{j2} f_{2i} + \cdots + w_{jk} f_{ki} $$

In this equation, z_{ij} is the relative standing of the ith person on the jth test; the f's represent
the relative level of ability of the person on the 1..k abilities, and the w's represent the
relative importance of each ability in predicting relative standing on the jth test. In effect,
Thurstone was advancing a kind of "mental chemistry" theory of learning and cognition, in which
any learning or performance activity could be characterized by an ability requirements (or
importance) profile, and any person could be characterized by an ability level profile. The
importance of Thurstone's system for classifying people and tasks was realized by military
psychologists, and it might not be too unfair to assert that the form of even present-day
selection and classification systems in industry and education, as well as in the military
services, is a fairly direct result of Spearman's and Thurstone's contributions.
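
The "mental chemistry" idea can be illustrated with a minimal sketch. It assumes only the common factor terms of the equation (any specific or error terms are ignored), and the ability labels, weights, and scores are hypothetical values chosen for illustration; they are not taken from Thurstone or from any battery discussed in this paper.

```python
def predicted_standing(weights, abilities):
    """Return the weighted sum of ability levels for one person on one test.

    weights   -- importance profile of the test (one weight per ability)
    abilities -- ability level profile of the person (standard scores)
    Only the common-factor terms are modeled; this is an illustrative
    simplification, not a full factor model.
    """
    if len(weights) != len(abilities):
        raise ValueError("weights and abilities must have the same length")
    return sum(w * f for w, f in zip(weights, abilities))


# Hypothetical ability level profile for one person, in the order:
# verbal, spatial, reasoning.
person = [1.0, -0.5, 0.25]

# Hypothetical importance profiles for two tests.
vocabulary_test = [0.8, 0.0, 0.2]   # depends mostly on verbal ability
mechanical_test = [0.1, 0.7, 0.2]   # depends mostly on spatial ability

print(predicted_standing(vocabulary_test, person))
print(predicted_standing(mechanical_test, person))
```

The same person receives a different predicted standing on each test because the tests weight the abilities differently, which is the sense in which both tasks and people are described by profiles.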

Consider, for example, Guilford and Lacey's (1946) monumental work, Printed Classification
Tests, which is widely credited with establishing the groundwork for virtually all subsequent
military selection and classification testing (Weeks, Mullins, & Vitola, 1975). In that report,
the authors divide the presentation into a more-or-less subjective task analysis of aircrew
operators' jobs followed by an evaluation of tests of a number of general and specific
abilities. According to Guilford, in his Preface, one of the key features of the work was
"...the inclusion of analysis of job criteria by the factorial methods. It is believed that in
this direction lies an economical, systematic, and dependable procedure for coverage of aptitudes
and for fitting tests to vocations" (p. iii). That is, the utility of personnel classification
tests was made obvious by the inclusion in the basic factor equation of both abilities tapped by
tests and abilities demonstrated in training or in jobs.

In one of the earliest validation studies following Guilford and Lacey's report, Dailey
(1948) emphasized the Thurstonian underpinnings of military testing by declaring that "...a
fundamental postulate has been that each airman specialty requires a different combination of
specific aptitudes for success. A further postulate is that basic airmen entering the Air Force
have greatly different patterns of the specific aptitudes essential to success in various
specialties," and further stating that in developing tests "...heavy emphasis is placed upon the
techniques of factor analysis" (p. 1). Since that time, a similar theoretical rationale has been
routinely stated in the introductory remarks of validation studies (e.g., Gragg & Gordon, 1950,
p. 3). As evidence that even today Spearman's and Thurstone's influence is felt in
classification battery development efforts, consider the following quote from Weeks et al. (1975):


    The fundamental postulate, which has served as the basis for the development of the
    classification batteries, is that each Air Force job specialty requires a specific
    pattern of aptitudes for success. If the major aptitudes common to the various
    specialties can be separately measured, it would be possible to predict each applicant's
    probable success in any job specialty by means of an empirically weighted composite
    score based on those tests measuring aptitudes necessary for that specialty (p. 7).

It is useful at this point to review the degree of success to which such a testing philosophy
has led. Despite widespread public attention and some confusion over the matter (e.g., Gould,
1982; Nairn, 1980), it is the case that relatively short (less than 3 hours total administration
time) batteries of psychological tests are remarkably accurate at predicting future learning and
performance criteria. As many have pointed out, one of the problems in some of the public
criticisms is the failure to take into account the range restriction phenomenon. An inspection
of the validity coefficients of the College Entrance Examination Board's Graduate Record Exam,
for example, gives a depressing picture of the utility of psychological tests. Wilson (1982, p.
16) found a validity coefficient of 0.27 using the GRE-V to predict first-year graduate grade
point average (GPA) in verbal fields (N = 620) and a coefficient of 0.28 using the GRE-Q to
predict GPA in quantitative fields (N = 529). Neither coefficient seems large enough at first
glance to place a great deal of confidence in the tests' abilities to select candidates for
graduate study. However, it is highly likely that the reason for the apparent modesty in the
magnitude of the coefficients is to be found in the severely restricted range of ability
characteristic of samples of students engaged in graduate study. Samples of military enlistees,
on the other hand, offer a much greater, though still not completely representative, degree of
heterogeneity, and countless validity studies conducted in military settings over the past 40
years attest to the utility of psychological tests. Table 1 summarizes results of a number of
validity studies conducted using evolving versions of military selection and classification tests
over the last 40 years. The validities shown in the table are correlations between weighted
composites of test scores from the particular test battery and final technical school GPA.
Across a wide variety of courses and a wide variety of batteries, validity coefficients are
consistently high.
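
The effect of range restriction on a validity coefficient can be made concrete with a short sketch. It uses the standard correction for direct restriction on the predictor (often called Thorndike's Case 2), which is an assumption introduced here for illustration and is not a procedure described in the studies cited above. The standard deviation ratio of 2.0 is purely hypothetical; the actual degree of restriction in Wilson's samples is not given in this paper.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the validity in the unrestricted group.

    r_restricted -- correlation observed in the selected (restricted) sample
    sd_ratio     -- predictor SD in the unrestricted group divided by the
                    predictor SD in the restricted sample (hypothetical input)
    Uses the correction for direct selection on the predictor
    (Thorndike's Case 2).
    """
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("r_restricted must lie between -1 and 1")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    r = r_restricted
    u = sd_ratio
    return (r * u) / math.sqrt(1.0 - r * r + r * r * u * u)


# Observed GRE-V validity reported in the text, with an assumed SD ratio.
observed = 0.27
assumed_sd_ratio = 2.0  # hypothetical: applicants twice as variable as enrollees
print(round(correct_for_range_restriction(observed, assumed_sd_ratio), 3))
```

Under this assumed ratio the estimated validity rises well above the observed 0.27, which illustrates why coefficients from highly selected graduate samples understate a test's usefulness in a more heterogeneous applicant pool.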

Table 1. Validity Coefficients of Air Force Test Batteries

Test                      Number of    Range of       Median      Median
Battery          Year     Courses      Validities     Validity    Sample Size

AC1-A            1951     29           .32 - .77      .61         264
AC1-B            1956     21           .34 - .77      .60         402
AC2-A            1959     46           .11 - .80      .57         124
AQE-D            1958      3           .46 - .50a     .47         182
AQE-F            1963     41           .29 - .90      .63         433
AQE-62           1962      4           .75 - .81a     .79         1493
AQE-64           1968     57           .38 - .87      .64         410
AQE-66           1973     46           .18 - .90      .68         115
AQE-d            1971      4           .69 - .84a     .82         3396
ASVAB-3          1968     46           .29 - .87a     .68         c
AQE/AFQT (1)     1974     42           .16 - .63b     .42         1000
AQE/AFQT (2)     1974     43           .16 - .65b     .44         823
AQE/AFQT (3)     1974     57           .22 - .68b     .53         890

Notes. The first ten rows are adapted from Weeks, Mullins, and Vitola
(1975). The last three rows are adapted from Christal (1976).
aInferred validities from test relationships with previous batteries for
which actual validity studies were conducted.
bNot corrected for restriction of range.
cUnknown.


One of the problems with the data in Table 1, as well as data from most validity studies, is
in the criterion of final technical school GPA. Weeks et al. (1975) recognized the lack of an
empirical job performance criterion as a critical limitation in these studies, but the problem
until recently has been the lack of any large scale efforts to develop satisfactory criteria.
It seems likely now, with pressure being applied both by the United States Congress and the
military services, that valid job performance measures are on the horizon (Eaton & Shields, 1985;
Gould & Hedges, 1983). Christal (1976) viewed the problem not only in terms of the lack of good
performance measures, but also in the inherent difficulty of "selling" the utility of aptitude
tests by using only the grade point average criterion. He suggested considering other criterion
variables, such as time to acquire skills, rate of skills decay, and time for skills
reacquisition. He also reported data from a number of studies that showed that such criteria are
as predictable by aptitude test scores as is the usual GPA. Nevertheless, there is an obvious
need for validation studies to include criteria that are both more valid in their relationship to
the target performance and more easily translated to dollars and cents utility.

If indeed it can be recognized that theoretical developments paved the way for subsequent
selection and classification applications, it is useful to consider the directions that ability
theory has moved since the days of Thurstone, in order to forecast future possible changes in
actual selection and classification systems. One avenue of theoretical advance has been in what
might be called factor theory. Much of this work may for the most part be viewed either as
extensions or syntheses of the Spearman and Thurstone proposals. Thus, for example, Vernon
(1961) and Burt (1940) have combined the two-factor and primary abilities models by proposing
hierarchical models of ability organization, with general ability at the top and more specific
abilities arranged in orderly fashion beneath. A currently popular variation on this scheme is
that there are two general abilities: general fluid-analytic (Gf) and general crystallized (Gc)
(Cattell, 1971; Horn, 1968). Gf is believed to be close, if not identical, to Spearman's g
(Gustaffson, 1984) and is said to drive the development of Gc, which represents the product of
accumulated learning experiences. Some developmental data would appear to support the
distinction: Gf level peaks during early adulthood, while Gc level rises continuously with age
(Cattell, 1971; Snow & Lohman, 1981).

Work on extending Thurstone's proposal is exemplified by Guilford's structure-of-the-intellect
model, which proposes 120 abilities but, more importantly, specifies the dimensions of
product, operation, and content along which mental tests can be classified. However, even
Guilford (1982) admits that these specific abilities may be correlated, and as Gustaffson (1984)
points out, this opens the door to a unifying hierarchical theory of ability organization. In
fact, Gustaffson has proposed such a theory, the HILI model (Hierarchical, LISREL-based model),
which takes advantage of recent developments in linear structural equation modeling techniques.
Gustaffson proposes a unifying synthesis of the Thurstonian primary-factor model with the
Cattell-Horn fluid-crystallized model. It is likely that such a unifying hierarchical model
brings with it advantages for practical application. For the most general decisions about an
individual's cognitive status, a test or battery of tests designed to tap the highest order
factor might be administered. With more subtle requirements for classification decisions,
coupled with the luxury of more available testing time, the samples might be obtained from the
lower strata of ability levels.

Despite these potentially important recent developments in factor theories and methods, much
of the theoretical interest in individual differences in learning and intelligence for the most
part waned during the 1960s. This lack of interest was due primarily to disaffection for the
method of factor analysis and its associated theories of learning and intelligence. Applied
psychologists noted a stagnation in the field from a utility standpoint. Despite increases in
the mathematical sophistication of factoring methods, a concomitant increase in occupational and
academic validities was not demonstrated (Christal, 1981), which is apparent from the absence of
any upward trend in the magnitude of coefficients in Table 1 over the years. At the same time,
others expressed a renewed dissatisfaction due to not understanding what it was that intelligence
tests measured (Hunt, Frost, & Lunneborg, 1973; Sternberg, 1977). An important change in
perspective occurred with the development of a psychology of cognition, starting in the late
1960s (see especially Neisser, 1967) and flourishing in the 1970s. The new cognitive psychology
provided a set of methodological tools for exploring mental processes within an experimental
approach, and beginning in the mid-1970s, the study of individual differences in mental processes
was once again an active area of investigation.

With the shift from the traditional differential and correlational methods of investigating
cognition to experimental methods came a shift in emphasis. While it is always true that
individual differences research is concerned with the ways in which people differ and the
underlying sources that lead to those differences, the newer approaches are at least equally
concerned with specifying the necessary prerequisite knowledge and cognitive skills that allow
any intelligent act, including learning, to occur. Thus, compared to the factor-analytic
approach, the experimental approach is characterized to a considerably greater degree by the
method of testing competing models of intelligent performance and only then examining individual
differences in the parameters of the appropriate model. This general approach actually has taken
two forms, labeled by Pellegrino and Glaser (1979) as the cognitive correlates and cognitive
components approaches. The goal of investigations conducted within the cognitive correlates
framework is to determine the cognitive skills and knowledge structures underlying observed
differences between high and low skilled individuals in broad ability (e.g., verbal, spatial,
numerical) or skill domains (e.g., physics, chess, electronics troubleshooting, geometry,
computer programming). In this research, high and low ability individuals (or experts and
novices) are identified, then administered a series of cognitive tasks, each of which is presumed
to be well understood in its cognitive requirements. Tasks are typically selected so as to tap
specific cognitive mechanisms, which enables the researcher to test competing hypotheses about the
sources of between-skill-group differences.

The goal of research conducted within the cognitive components framework is similar to
cognitive correlates research in that it too seeks to explain the cognitive mechanisms underlying
broad ability differences. The difference is that the components approach is to investigate
directly the usually more complex tasks on which the ability differences are observed, an
approach originally suggested by Estes (1974). In typical cognitive components investigations,
individuals attempt tasks that bear a strong resemblance to intelligence test items. By
systematically varying features of the task, the researcher allows various mathematical models to
be fit to the latency or error data for individual subjects. Each model is normally an embodiment
of a particular theory of the tasks performed. Parameters of the models usually represent
various psychological processes, and thus an individual's process execution times or
probabilities are estimated directly in the parameters.
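
As a minimal sketch of this kind of per-subject model fitting, consider the simplest possible case: latency is assumed to be a linear function of one manipulated task feature, so that the slope estimates the execution time of the process that is repeated and the intercept absorbs all remaining processes. The linear model, the feature (number of items to be processed), and the latency values below are all hypothetical and chosen only for illustration; they are not drawn from any study reviewed in this paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope). Here x is the manipulated task feature
    and y is one subject's mean latency at each level of that feature.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical task feature: number of items to be processed per trial.
item_counts = [1, 2, 3, 4, 5, 6]

# Hypothetical mean latencies (milliseconds) for two subjects.
latencies_ms = {
    "subject_a": [440, 478, 521, 560, 598, 642],
    "subject_b": [510, 575, 636, 702, 761, 829],
}

# Fit the model separately to each subject; the parameters, not the raw
# scores, become the individual-differences measures.
for subject, rts in latencies_ms.items():
    intercept, slope = fit_line(item_counts, rts)
    print(f"{subject}: intercept={intercept:.1f} ms, slope={slope:.1f} ms/item")
```

Competing theories of a task would correspond to different model forms fit to the same data, with the better-fitting model supplying the parameters used to compare individuals.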

In the past 10 years, a considerable amount of individual differences work has used these two
approaches or variations, and this has led to the crystallization of different views on the
nature of individual differences in intelligence and learning ability. The remainder of this
paper is devoted to consideration of this recent work.


III.  THEORETICAL  FOUNDATIONS 

Typically, two criteria are used when determining which or what kinds of tests to include in
a battery for selection and classification decisions. The first, which might be designated the
job sample criterion, rests on the evaluation of the content validity of the candidate test. If
one must select a secretary, for example, a battery that includes tests of typing skill and that
samples other typical secretarial duties would satisfy this criterion. A second criterion, which
might be labeled simply the empirical criterion, is satisfied to the degree that performance on
candidate tests correlates positively with performance on the target job. There are problems with
both of these criteria. A logistical problem with the job sample criterion is that there may be
too many jobs from which to extract work samples. If the test battery is to be used for
classification decisions, and there are many possible jobs into which an applicant might be
classified (as is the case in large organizations such as the Air Force), then the amount of
testing time required to sample all possible jobs is prohibitive. However, an even more
devastating problem with the job sample criterion is that job requirements are constantly
changing. For example, it is likely that the skills involved in successful manuscript production
on a mechanical typewriter are different from those employed in using a powerful word processing
system.

There are also problems with the empirical criterion. First, the criterion can only be applied
after a decision about what test to try has been made. It can be determined whether the currently
existing battery of secretarial tests does an adequate job, but some other means for selecting
those tests in the first place is required. Second, the empirical criterion does not provide an
absolute standard against which to measure a test's success; if a validity coefficient of 0.30 is
found, does that mean a test is a good one or not? The empirical criterion, applied after the
fact, never provides information about whether some other test might prove to be more valid than
the existing one. Finally, a third problem is that validity studies themselves can be quite
expensive.

One of the premises on which this paper is based is that these problems may be alleviated through
the application of a technology of psychological testing derived from cognitive theory, in what
might generically be called the cognitive skill assessment approach. Such an approach, in
principle, would require (a) the determination of what cognitive skills are required in training
and in the work place, (b) the determination of what cognitive skills are involved in taking
psychological tests, and (c) the matching of training/job skills with cognitive task skills,
thereby logically deriving training/job skill requirements. This would amount to a kind of
decomposition analysis in which aptitudes would be redefined as sets of cognitive skills and jobs
would be defined as sets of cognitive requirements. Such an approach would provide a different
perspective on the person-job-match system (Holtz & Schmitz, 1985) and would serve as a flexible,
adaptive system for specifying job skills and person skills for all kinds of training situations:
computer-assisted instruction, on-the-job training, and even lockstep classroom instruction.
Also, such a system offers promise as a test construction tool. One can imagine specifying in
advance what cognitive skills will be measured when various facets of a complex cognitive task
are systematically manipulated.

A system such as the one outlined, however, requires the foundation of a solid theory of
individual differences in cognition. Unfortunately, such a theory does not yet exist, and much
of the remainder of this paper will be devoted to an assessment of how far the development of
such a theory has progressed. The first consideration is what such a theory should do:

1. A theory of individual differences in cognition should specify the cognitive processes
and knowledge structures that underlie individual differences in the ability to acquire and apply
knowledge and skills in a broad variety of contexts. Call the underlying attributes the sources
of individual differences.

2. The theory should specify how these sources can be assessed at the level of the
individual.

3. The assessment techniques should yield quantifiable indicators that serve both to
provide an account of how well the theory fits the data (i.e., how well the source measurements
predict performance in learning and intelligent-performance contexts) and to be used in principle
as ability measurements in an operational context.


In the last few years a number of theoreticians have applied some of the ideas emanating from
cognitive psychology in speculating on the form a theory of individual differences in cognition
might take. Snow (1978; 1980) has proposed that individuals might differ either in the
efficiency with which they are able to execute elementary information processes, or in their
approach to or their general strategy for attacking problems. Hunt (1978) has suggested that
differences in cognitive abilities might be reduced to differences in knowledge, strategies, or
mechanistic processes. And Sternberg (1977; 1980) has proposed an elaborate component hierarchy
in which individuals differ in metacomponents, performance components, acquisition components,
retention components, and transfer components. All these proposals must be viewed as somewhat
speculative at present, but nevertheless they may be useful in proposing research directions.

In the LAMP project, a slightly different framework has been adopted, not only as a heuristic
for guiding research but also as a way of organizing and classifying the existing and now
burgeoning literature on individual differences in cognitive abilities. The framework is derived
from a critical review of the existing cognitive-differential literature (Kyllonen, 1985a). From
the review, three general conclusions can be drawn. First, whatever it is that underlies
intelligent performance also underlies the ability to learn. This is consistent with both the
empirical evidence and theoretical considerations derived from an analysis of current cognitive
theory. Second, four sources can be tentatively identified as underlying the ability to learn
and to perform intelligently. These are (a) working memory capacity, (b) information processing
speed, (c) the declarative or factual knowledge base, and (d) the procedural or strategic
knowledge base. Currently, these sources are merely taxonomic categories for variables that in
principle could be measured on individual subjects. Nevertheless, such a taxonomic delineation
is a useful first step. The third conclusion to be drawn is that the sources do not contribute
additively to proficiency in learning and intelligent behavior; they interact. In particular,
the extent of an individual's declarative and procedural knowledge base in a particular domain
affects both the individual's effective working memory capacity and his or her speed of
processing information related to that domain. These relationships are depicted graphically in
Figure 1 in what is termed here the interactive common sources framework.


Figure  1.  Interactive  Common  Sources  Framework. 


According to the framework, the success that an individual experiences in classroom learning
activities, on-the-job training, and on-the-job performance is determined by the individual's
level of cognitive and learning proficiency. Cognitive proficiency refers here to an
individual's ability to remember, to make decisions and choices, and to solve problems. Learning
proficiency refers to the individual's ability to acquire new facts and cognitive skills. This
distinction is meant to align roughly with the classical distinction between learning and
performance. The guidance provided by this framework also permits the tentative assumption that
differences between people in learning and cognitive proficiency levels result from differences
in the more fundamental sources of processing speed, memory capacity, declarative (factual)
knowledge, and procedural (rule-based) knowledge. The common sources view is that these
components are what underlie both learning and cognitive differences. The interactive view is
that the sources interact with each other in determining proficiency levels. For example,
extensive factual knowledge in a particular domain (e.g., chess) can enlarge an individual's
effective memory capacity (e.g., to memorize a complex board configuration) and effective
processing speed (e.g., to select the best next move). Techniques for measuring these variables
are discussed in the following section.


IV.  ISSUES  AND  TECHNIQUES  FOR  COGNITIVE  ASSESSMENT 

The framework presented in the previous section can serve as a useful guide for research;
nevertheless, important details have been left unspecified. Before the framework can evolve into
a model or theory of individual differences in cognition, three classes of issues must be
considered: assessment methodology, analysis of complex cognitive skill, and analysis of
learning.


Methods  for  Cognitive  Assessment 


It is critical to address the issue of how the underlying sources of knowledge and the
information processing parameters can be assessed. More precisely, the questions to be asked are
as follows: How can processing speed be measured? How can an individual's working memory
capacity be determined? How can the extent and quality of an individual's knowledge base be
assessed? In each case, the obvious questions must be answered: Can the target source
(processing speed, memory capacity, knowledge) be measured reliably? Is it already measured by
conventional tests, or are new techniques required? Can the source be considered a
unidimensional construct, or is there more than one dimension involved? And finally, does source
capability change with practice, and if so, by how much?


These issues may be considered in the context of a number of cognitive correlates studies
that have been conducted at AFHRL and elsewhere in the last few years. Much of this work has
been driven by the general consideration of whether elementary cognitive tasks might someday
supplement or even replace conventional tests as aptitude and performance measures. Cognitive
psychologists have been remarkably successful in developing mathematical models that account for
patterns of error and latency data across a large number of tasks by positing various mental
processes and knowledge structures. It has occurred to a number of individual-differences
researchers that to the degree such models are valid representations of psychological processing,
the parameters of such models can serve as direct indicators of the speed or accuracy with which
an individual can execute a particular psychological process.


In one of the first large-scale efforts undertaken with this general philosophy in mind,
Rose and Fernandez (1977) described:

...a program of research dealing with the development and validation of a comprehensive
standardized test battery that can be used as an assessment device for the evaluation of
performance in a wide variety of situations.... Equally important, the battery is being
designed to include tests that possess construct validity: there will be a firm
theoretical and empirical base for inferring the information processing structures and
functions that the tests purport to measure. It is expected that such a battery will
permit improved personnel management decisions to be made for a wider variety of
Navy-relevant jobs than is currently possible using existing techniques. (from the
abstract)

With equal enthusiasm, Carroll (1980), after an extensive review of the then existing
literature, proclaimed that the new approach of investigating individual differences with
elementary cognitive tasks offered considerable promise not only in supplementing conventional
tests but also as a means for assessing the effects of physiological changes and of aging. Of
particular interest to Carroll was the possibility that the absolute measurement afforded by the
analysis of cognitive tasks, as contrasted with relative measurement given by conventional
correlational methods, might ultimately result in a Systeme Internationale of experimental
psychology.


Individual  Differences  on  Cognitive  Tasks 

It is instructive, then, to consider just how promising the approach of supplementing
traditional ability measures with scores from elementary cognitive tasks is. The first issue
concerns whether there are reliable individual differences in various scores that can be computed
from such tasks. If the scores or parameters are not reliable, it does not mean that such scores
are imprecise (Rogosa & Willett, 1983), but it does mean that there are no individual differences
to speak of in the task or parameter of interest. Thus the establishment of reliability can be
viewed as a central issue in the determination of whether a particular score is a good measure of
individual differences.

In the Rose and Fernandez (1977) study, 54 college students were administered a battery of
nine cognitive tasks, each presented on a computer. Between 36 and 38 subjects (depending on the
task) were readministered the tasks on a second day, thereby allowing the computation of
test-retest reliabilities. Tasks were selected to represent the domains of memory,
psycholinguistics, and visual information processing. Further, the tasks selected (a) had a
history of published support, (b) had an adequate theoretical rationale, (c) were adaptable to
paper-and-pencil or computerized administration, and (d) had shown evidence that reliable
individual differences were present on the task. Table 2 presents descriptive statistics on
various scores
computed from the tasks.

Table 2. Descriptive Statistics from Rose-Fernandez Study

Score                                     M1     M2    SD1    SD2   rxx'

Posner and Mitchell's Letter Classification Task (Response Latencies)
  Physical Match (PI)                    585    547     64     57    .57
  Name Match (NI)                        684    629    100     71    .58
  Category Match (CI)                    849    771    173    121    .78
  Different                              761    693    104     83    .81
  NI-PI                                   99     81     62     45    .29
  CI-NI                                  164    137    131    102    .69

Meyer's Lexical Decision Task (Response Latencies)
  Word Recognition                       736    647    112     74    .66
  Nonword Recognition                    916    756    252    113    .53
  Encoding Facilitation                  975    958     78     75    .42

Baron's Graphemic/Phonemic Analysis Task (Response Latencies)
  Sense-Nonsense                        1205   1193    246    197    .83
  Homophone-sense                       1289   1187    300    241    .90
  Homophone-nonsense                    1579   1423    306    235    .47
  SH/HN                                  .81    .83    .09    .09    .37

Sternberg's Digit Scanning Task (Response Latencies)
  Slope, Positive                         75     49     32     21    .60
  Intercept, Positive                    442    425     68     78    .52
  Slope, Negative                         46     47     28     15    .45
  Intercept, Negative                    536    464     96     59    .51

Juola's Word Scanning Task (Response Latencies)
  Slope, Positive                         56     52     32     24    .19
  Intercept, Positive                    483    446    102     89    .46
  Slope, Negative                         47     53     32     31    .00
  Intercept, Negative                    544    446    145     67    .40

Juola's Category Scanning Task (Response Latencies)
  Slope, Positive                        122     93     85     65    .31
  Intercept, Positive                    611    637    245    216    .68
  Slope, Negative                        214    140     96     56    .32
  Intercept, Negative                    575    595    238    176    .36

Clark's Sentence-Picture Verification Task (Response Latencies)
  "below"                                136    110    155    149   -.06
  "negate"                               829    685    354    319    .81
  "comparison"                           200    146    183    200    .28
  "encode"                              1735   1489    404    330    .59

Collins and Quillian's Fact Verification Task (Response Latencies)
  Slope, superset relation                63     42     57     77    .21
  Intercept, superset relation          1035   1017    205    220    .69
  Slope, property relation                67     53     89     75    .16
  Intercept, property relation          1118   1121    257    248    .73

Shepard and Teghtsoonian's Recognition Memory Task (Probability Correct)
  Proportion Correct                     .73    .73    .07    .07    .56
  Exponent on decay function (lag)      -.07   -.10    .19    .08    .31
  Intercept on decay function (lag)      .86    .93    .15    .12    .21
  Probability (hit)                      .73    .77    .11    .12    .56
  Probability (false alarm)              .28    .31    .12    .12    .67
  d' (sensitivity)                      1.28   1.34    .41    .45    .62
  Beta (bias)                           1.09    .93    .62    .50    .38

Note. Response latencies are in milliseconds. M1: mean on day 1; M2: mean on day 2;
SD1: standard deviation over subjects on day 1; SD2: standard deviation on day 2;
rxx': day 1, day 2 correlation.


The various parameters for each task are described in detail by Rose and Fernandez (1977).
It may nevertheless be useful to consider the top listed task, Posner and Mitchell's (1967)
Letter Classification Task, in more detail, both to provide a sense for the scores that are
derived from a single task, and because this task in particular has received considerable
attention in the individual-differences literature. In this task the subject is shown two
letters side by side. The task is to determine whether the two letters are (a) physically
identical (e.g., A-A), (b) identical in name (e.g., A-a), or (c) the same in terms of
vowel-consonant category (e.g., A-E), depending on individual task instructions. If the letters
matched according to task instructions, the subject was to respond by pressing one of two keys
on a panel, but if the letters were not the same, the subject was instructed to press the other
key. In addition to the mean response times for each of these three tasks (for "same" trials),
three other scores could be computed. A "Different" score was computed from response times on
trials for which the letters did not match. A name-identity (NI) minus physical-identity (PI)
response time score was computed to reflect the speed with which an individual can access
information from long-term memory. The rationale behind this computation is that to make the PI
judgment, an individual can respond on the basis of the displayed physical information, but to
make an NI judgment, the individual must retrieve the name of both letters from long-term memory
before these abstract name codes can be compared to one another.
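
The derived scores just described can be illustrated with a minimal sketch. The function name
and the per-trial latencies below are hypothetical and are not taken from Rose and Fernandez;
the trial values were simply chosen so that the condition means equal the day 1 means in Table 2.

```python
from statistics import mean

def derived_scores(pi_rts, ni_rts, ci_rts):
    """Mean "same"-trial latencies (ms) per condition, plus the two difference scores."""
    pi, ni, ci = mean(pi_rts), mean(ni_rts), mean(ci_rts)
    return {
        "PI": pi,
        "NI": ni,
        "CI": ci,
        "NI-PI": ni - pi,  # presumed time to retrieve letter names from long-term memory
        "CI-NI": ci - ni,  # presumed additional time to retrieve category membership
    }

# Hypothetical latencies (ms) for one subject, three trials per condition.
scores = derived_scores(
    pi_rts=[560, 590, 605],
    ni_rts=[670, 690, 692],
    ci_rts=[830, 850, 867],
)
print(scores)
```

Each difference score inherits measurement error from both of its component means, which bears
on the reliability issues taken up in the paragraphs that follow.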

Table 2 presents means and standard deviations, in milliseconds, for the day 1 and day 2
statistics from the Rose and Fernandez study, in the columns marked M1, M2, SD1, and SD2. Also,
the test-retest correlations are displayed as reliability indices in the column marked rxx'.
Note that in all but a very few cases, there was a considerable decrement in response latency
over days, and also a corresponding decrement in variability. The test-retest reliability data
show that in many cases the ordering of individuals changed substantially from one day to the
next. This can be interpreted in one of two ways. The conventional wisdom is that this
indicates such scores are unstable and therefore not good candidates for a performance test
battery. Alternatively, if both day 1 and day 2 internal reliabilities are high, but the
test-retest reliability is low, it can mean that some individuals are benefiting from practice
and others are not, which itself could be an important individual difference variable.
Unfortunately, Rose and Fernandez did not provide internal consistency data.
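
The distinction between a uniform practice gain and a change in the ordering of individuals can
be made concrete with a small sketch of the test-retest correlation. The helper name and all
latencies are hypothetical; the computation is the ordinary Pearson product-moment correlation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical day 1 mean latencies (ms) for five subjects.
day1 = [585, 640, 530, 610, 700]

# Case 1: every subject speeds up by the same 40 ms; the ordering is preserved.
day2_uniform = [rt - 40 for rt in day1]

# Case 2: subjects benefit unequally from practice; the ordering changes.
day2_reordered = [600, 560, 590, 540, 620]

r_uniform = pearson_r(day1, day2_uniform)
r_reordered = pearson_r(day1, day2_reordered)
print(r_uniform, r_reordered)
```

A drop in mean latency alone leaves the test-retest correlation untouched; only differential
change across individuals lowers it.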


In any event, caution should be applied before taking any of Rose and Fernandez'
reliabilities too seriously, because they are based on an extremely small sample. Yet the
pattern of reliabilities may still be informative, and one pattern apparent from Table 2
is that derived scores are generally less reliable than scores that represent the duration of
performing a complete task, which is consistent with many other studies in the literature. For
example, while both the NI and PI match scores are highly reliable, the NI-PI difference score is
not. Carter and Krause (1983) reported data on some of the tasks in Table 2, along with some
others, and found that, in all cases, slope scores (a kind of difference score) were less
reliable than were the mean response times from which the slopes were computed. From this
result, they argued that slope scores should not be used as performance measures, but rather the
mean response times by themselves are sufficient for answering most questions the applied
researcher might be interested in asking. However, the Carter and Krause argument is at odds
with the stated philosophy of Rose and Fernandez, who argued that total scores are often less
meaningful than scores derived from total scores insofar as they reflect combinations of
psychological processes rather than a single process, such as "memory comparison." The meaning
of this discrepancy is discussed in the following paragraphs.

The topic of difference scores has been a highly controversial one in the psychological
literature for at least the last 25 years, but a recent analysis of the topic by Rogosa and his
colleagues (Rogosa, Brandt, & Zimowski, 1982; Rogosa & Willett, 1983) is clarifying. Rogosa et
al. argued that considering the statistical, as well as psychometric, properties of change
measures leads to an evaluation of the reliability of the difference score that is at odds with
widely accepted notions. In particular, Rogosa and colleagues showed that the difference score
is unreliable when individual differences in change do not exist, but that it can be highly
reliable if in fact individual differences in change do exist. In many of the tasks inspected by
Carter and Krause and others, there is a high correlation between two of the scores from which
the slope is computed. Also, in many of the reported studies, there is an extremely high
correlation between response time on the NI and PI match tasks. What these high correlations
indicate is that there is very little in the way of individual differences in the change between
performance on the two tasks. Thus, it is not merely a statistical artifact that produces low
change score reliabilities. Rather, the low reliabilities (or conversely, high correlations
between task 1 and task 2) are interesting empirical results that establish the lack of
individual differences in the change variable.
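
The dependence of difference-score reliability on the correlation between the two component
scores can be sketched numerically. The formula used below is the standard classical-test-theory
expression for the reliability of a difference between two scores with uncorrelated errors; it
is supplied here as an assumption and is not quoted from Rogosa et al. The function name and the
input values are hypothetical.

```python
def difference_reliability(sd1, sd2, rel1, rel2, r12):
    """Reliability of the difference X2 - X1 under classical test theory.

    sd1, sd2   : standard deviations of the two component scores
    rel1, rel2 : reliabilities of the two component scores
    r12        : observed correlation between the two component scores
    Assumes measurement errors are uncorrelated across the two scores.
    """
    cov = r12 * sd1 * sd2
    true_change_var = sd1 ** 2 * rel1 + sd2 ** 2 * rel2 - 2 * cov
    observed_change_var = sd1 ** 2 + sd2 ** 2 - 2 * cov
    return true_change_var / observed_change_var

# Two equally variable, equally reliable scores (reliability .90 each).
high_overlap = difference_reliability(1.0, 1.0, 0.90, 0.90, r12=0.80)
low_overlap = difference_reliability(1.0, 1.0, 0.90, 0.90, r12=0.20)
print(high_overlap, low_overlap)
```

With identical component reliabilities, the difference score is far less reliable when the two
components correlate highly, because little true variance in change remains to be measured.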

It should be pointed out that Carroll (1980) has presented analyses of data on many studies
that have appeared in the literature and are similar to the Rose and Fernandez study (1977).
Unfortunately, the vast majority of those studies, like the Rose and Fernandez study, also
suffered from a small sample size. More recently, the Naval Biodynamics Laboratory has supported
a number of studies that have investigated the psychometric properties of a large number of
cognitive tasks (Carter & Krause, 1983; Kennedy, Bittner, Carter, Krause, Harbeson, McCafferty,
Pepper, & Wiker, 1981; Kennedy, Bittner, Harbeson, & Jones, 1981). These studies too have
suffered from small sample sizes.


Do  Cognitive  Tasks  Measure  Unique  Abilities? 

If it turns out that there are reliable individual differences on many elementary cognitive
tasks, then the question of whether these scores represent previously unidentified abilities can
be addressed. A recently completed study at AFHRL approached this question by comparing
cognitive task scores to aptitude test scores. Fairbank, Tirre, and Anderson (1984) administered
30 different cognitive tasks, divided into six task batteries, to six independent samples of Air
Force enlistees. Tasks were selected from a taxonomy to reflect verbal, spatial, and
quantitative processing and to yield meaningful error and latency scores. That is, not all the
tasks were designed solely to measure some form of processing speed. For each task, standard
mathematical models of errors and solution latency were fit to the data, then certain parameters
of these models were extracted as indicators of various aspects of processing efficiency. The
rationale behind the different models will not be discussed here, but the rationale was similar
to that given by Rose and Fernandez. However, Fairbank et al. also reported some total task
score data (i.e., mean over all items). With reference to the previous discussion, such scores
are useful to the extent that individual differences on scores derived from total scores do not
exist.

One key result from the data published by Fairbank et al. (1984) was that reliabilities
tended to be high, indicating stable individual differences on most of these tasks. When
considering average scores, reliabilities for all the elementary reaction time tasks exceeded
.90. On the other hand, derived score reliabilities presented a mixed picture. Some of the
derived scores, such as the NI-PI difference, had low reliability (less than .50), while others,
such as the slope from memory scanning tasks (Sternberg, 1969), had high reliabilities (greater
than .80). Pellegrino (1984) has also found high reliabilities among memory scanning tasks.
Fairbank et al. also computed correlations between the information processing scores and
standardized test score composites taken from the Armed Services Vocational Aptitude Battery
(Department of Defense, 1984). The General (G), Administrative (A), Electronics (E), and
Mechanical (M) composites are those used by the Air Force for personnel classification purposes.
Thus, such correlations reflect the degree to which cognitive task data overlap the conventional
test score data that are used operationally in the Air Force.

The pattern of correlations with ASVAB composites showed that despite the fact that the
cognitive task scores were fairly reliable, correlations with conventional measures tended to be
fairly low: No correlation exceeded 0.50. This indicates that conventional tests do not tap, at
least in a factorially pure sense, the processing skills tapped by the cognitive tasks. Thus,
although cognitive tasks are not likely to replace conventional tests, there might very well be
room to supplement them. Elsewhere, Kyllonen (1985b) reported data consistent with the Fairbank
et al. finding on this matter. In a factor-analytic investigation of 17 cognitive tasks that
tapped verbal, quantitative, and reasoning abilities, along with ASVAB tests, it was found that
of eight cognitive factors identified by the conventional and cognitive tests, three (reasoning
speed, verbal speed, and quantitative level) were not measured by the ASVAB tests.

There are a number of other interesting trends in the Fairbank et al. (1984) data. First,
while most of the response time measures correlated highest with the A composite (which bears
considerable resemblance to the Clerical or Perceptual Speed Factor in the differential
literature), the percent correct measures computed on the more complex reasoning and memory tasks
correlated higher with the G composite (the general ability measure). This suggests that error
and latency data may reveal different aspects of performance, a finding consistent with the
Kyllonen (1985b) results. It also was found that the intercept in the scanning tasks correlated
highest with the A composite, while the slope showed no strong or consistent differential
correlation pattern. The intercept is presumed to reflect time for encoding and response, and
the slope is presumed to reflect the time it takes to perform a single memory comparison step.
Also, Fairbank et al. found that with very few exceptions, scores from cognitive tasks correlated
higher with either the A or G composite than they did with the M or E composite, presumably
because these latter two scores reflect primarily the extent of specialized knowledge bases in
either the mechanical or electronics domains.
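
The slope and intercept scores for a scanning task can be illustrated with an ordinary
least-squares fit of mean latency on memory set size. This is a minimal sketch under stated
assumptions: the function name is hypothetical, and the latencies are constructed to lie exactly
on a line whose intercept (442 ms) and slope (75 ms per item) equal the day 1 positive-trial
means for Sternberg's Digit Scanning Task in Table 2. It is not the fitting procedure reported
by any of the studies cited.

```python
def fit_line(set_sizes, latencies):
    """Least-squares fit of latency = intercept + slope * set_size."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(latencies) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(set_sizes, latencies))
    sxx = sum((x - mx) ** 2 for x in set_sizes)
    slope = sxy / sxx              # presumed time per memory comparison (ms per item)
    intercept = my - slope * mx    # presumed encoding plus response time (ms)
    return slope, intercept

# Hypothetical mean latencies (ms) for one subject at memory set sizes 1 through 6.
set_sizes = [1, 2, 3, 4, 5, 6]
latencies = [442 + 75 * s for s in set_sizes]

slope, intercept = fit_line(set_sizes, latencies)
print(slope, intercept)
```

Fitting the line separately for each subject yields one slope and one intercept per person,
and these are the quantities that are then correlated with the aptitude composites.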

Two other results are worthy of mention. One is that mean and standard deviation were highly
correlated on almost all of the response time measures, reflecting increasing variability being
associated with slower responding. Finally, on two linguistic transformation tasks, Sentence
Verification and Three-Term Series, Fairbank et al. computed percent correct scores separately
for the first and final blocks of items. In both cases, the correlation between percent correct
and G was significantly higher on the first than on the final block, perhaps reflecting the fact
that general cognitive demands are greater early on in a test. This is consistent with Fleishman
and Hempel's (1954) classic demonstration of changing cognitive demands on a psychomotor task as
a function of practice.

At the time of this writing, Fairbank et al. are engaged in further analysis of their
extensive data set to explore the degree to which parameters measured on one set of tasks
correspond to parameters on various other sets. The purpose of the analysis is to explore the
generality of the various processing parameters. Although similar kinds of analyses have been
conducted by others (Carroll, 1980; Rose & Fernandez, 1977), as was mentioned previously, most of
these have suffered from lack of power due to inadequate sample size. With cognitive task data
collected on over 2500 Air Force trainees, it should prove possible to examine various hypotheses
beyond the reach of others who have attempted to do so with smaller subject pools.


Analysis of Changes in Processing Efficiency

The final major issue to be addressed in conjunction with the general consideration of
assessment methodology concerns the degree to which subjects improve with practice. Rose and
Fernandez (1977) showed large improvements on the second day, and these findings are consistent
with many published reports of performance on cognitive tasks that have not examined individual
differences. A fairly well-established finding across many diverse cognitive tasks is that
response time decreases in accordance with a power law (Lewis, 1980; Newell & Rosenbloom, 1981),
RT(N) = aN^b, where RT(N) is response time for an item at trial N, a is response time for the
item on trial 1, and b is the parameter governing the rate of change (b is negative when response
time decreases with practice). Recently, individual differences researchers have begun to examine
the question of whether rate of improvement is an important source of individual difference
variation that deserves special consideration.
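
The power law relation between response time and practice can be estimated for a single examinee by a straight-line fit in log-log coordinates, since log RT(N) = log a + b log N. The sketch below is a minimal illustration under that assumption; the function name fit_power_law and the simulated response times are hypothetical, and the cited studies may have used other estimation procedures.

```python
import math


def fit_power_law(trials, rts):
    """Estimate a and b in RT(N) = a * N**b by least squares on logs."""
    log_n = [math.log(n) for n in trials]
    log_rt = [math.log(rt) for rt in rts]
    k = len(log_n)
    mean_x = sum(log_n) / k
    mean_y = sum(log_rt) / k
    sxx = sum((x - mean_x) ** 2 for x in log_n)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_rt))
    b = sxy / sxx                        # rate-of-change parameter
    a = math.exp(mean_y - b * mean_x)    # response time on trial 1
    return a, b


# Hypothetical examinee: 1200 ms on trial 1, improving with exponent -0.3.
trials = list(range(1, 51))
rts = [1200.0 * n ** -0.3 for n in trials]

a, b = fit_power_law(trials, rts)
print(f"a = {a:.1f} ms, b = {b:.3f}")
```

Fitting each examinee separately yields one starting point (a) and one change rate (b) per person per task, which are the quantities whose correlations with aptitude measures are at issue in the studies reviewed in this section.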

In one noteworthy study, conducted as part of Project LAMP, Pellegrino (1984) investigated
individual differences in changes in information processing efficiency on fairly simple cognitive
tasks of the type reviewed in previous sections. In his study, 60 young adults were administered
three cognitive tasks in four to eight successive sessions. Tasks were designed to sample visual,
verbal, and quantitative abilities. The visual processing task was a perceptual matching task in
which subjects were required to compare matrices of varying size with one another for physical
identity. The semantic processing task was a version of the matching task in which subjects were
instructed to match by letter category, but letters could be the same (or different) physically
or by name or category. The quantitative task presented elementary number facts involving
addition, subtraction, multiplication, or division, and the subject indicated whether the given
answer was true. Analyses centered on (a) determining the adequacy of various proposed
mathematical models that posited elementary operations such as encoding, comparison, decision,
and response, (b) determining the relationships between information processing parameters and
standard aptitude test scores, and (c) determining the relationships between standard test scores
and slope and intercept parameters derived from a power law analysis of practice effects.

The initial analyses indicated that the models fit the data quite well, and parameters from
the models were generally reliable. Also, Pellegrino found that individuals performed all the
tasks faster as a function of practice, in accordance with the power law characterization of rate
of change. The processing parameter-aptitude correlations were generally low, except in the case
of the perceptual speed factor, which was modestly related to the intercept parameter in some of
the models (where the intercept reflected time for encoding and responding), consistent with the
Fairbank et al. (1984) results. The analysis of starting point (intercept) and rate of change
(slope) parameters showed that starting point in many cases was significantly related to
perceptual speed, and also was related across tasks (median r = 0.48). However, change rate was
unrelated to any of the aptitude measures. Further, change rates across tasks were unrelated to
each other (median r = 0.08). This result suggests that there is not a general learning ability,
at least insofar as this particular type of change is regarded as a form of learning. Rather,
learning of this type seems to be task specific.

Recently, Ackerman and Schneider (1984) reviewed a number of studies that showed data
consistent with Pellegrino's results. The authors observed that correlations between initial
performance and final performance on a wide variety of psychomotor and cognitive tasks have
typically been found to be quite low, which is to say that individual differences early in
training do not map very neatly onto individual differences later in training. Considering that
the purpose of ability tests when used as aptitude measures is to predict individual differences
at the end of training, this is a lamentable finding. It also clashes with the view that
intelligence is related to the ability to learn.

With these observations as an impetus, Ackerman and Schneider attempted to provide a
comprehensive account of individual differences in changes in information processing efficiency
by synthesizing current ideas on the structure of human abilities with considerations of
automatic/controlled processing theory (Shiffrin & Schneider, 1977). The ability model they
adopted is of the hierarchical variety reviewed in a previous section (along the lines of
Gustafsson's, 1984, HILI model). The processing model assumed that there are two distinct forms
of processing: controlled and automatic. Automatic processing is fast, can be done
simultaneously (i.e., in parallel) with other processing activities, and does not draw on
attentional resources. Controlled processing is slow, is performed serially, and draws heavily
on attention. It has been shown that processing can become more automatic with extensive
practice, so long as processing requirements remain consistent from trial to trial. However, if
requirements vary, processing will remain in a controlled state (Fisk & Schneider, 1983).

Ackerman and Schneider proposed a mapping between the concept of general attentional
resources (Kahneman, 1973) and general intellectual ability, from which can be derived the
prediction that general ability and broad-domain abilities should be related to success on tasks
requiring substantial amounts of controlled processing (where controlled processing, by
definition, requires a heavy investment of general or broad-domain attentional resources).
Conversely, the authors also proposed a mapping between automatic processing and lower-order,
highly specific abilities, which leads to the prediction that general ability will not be related
to success (e.g., speed) during automatic processing, but some low-order factor might be. Ackerman
and Schneider (see also Ackerman, 1984) reported data that lent some support to their proposal.
They showed that on a task that prevented the development of process automaticity (by varying the
processing requirements over trials), general ability and verbal ability (i.e., a broad-domain
ability) were highly related to response time. The relationship between these broad abilities
and performance on a task that enabled the development of process automaticity (by maintaining
consistent processing requirements) was lower. It also was found that the relationship between
perceptual/motor ability and task performance did not differ between the two tasks.

The Pellegrino and the Ackerman and Schneider studies represent important initial steps in
the investigation of individual differences in a particular kind of learning, which Rumelhart and
Norman (1978) have called "tuning." These researchers have gone beyond considering individual
differences in initial performance, moving toward a characterization of differences in
performance changes as a function of practice. Given that such changes are so extremely
commonplace on cognitive tasks, and also that the purpose of using cognitive tasks for selection
and classification is to predict the endpoint of often extensive training, it is likely that this
topic and the general approaches employed by these researchers will become important cornerstones
in future individual differences research. Applications may be particularly appropriate in
endeavors that apply such methods to the prediction of success in high-speed decision-making
activities such as air traffic control and aircraft piloting. Although Ackerman and Schneider's
call for a change in assessment procedures based on automatic/controlled processing theory may be
a bit premature (and impractical in that as many as 20 hours of testing may be required), a
proposal to continue research along these lines is not premature.


V.  ANALYSIS  OF  LEARNING 

As Ackerman and Schneider (1984) rightly point out, the purpose of ability testing is
often to predict who will do best after extensive training. That being the case, it would seem
appropriate to direct ability testing toward the analysis of learning ability. That is, it is
important that ability assessment devices yield information about who learns best in a particular
situation, and therefore, who will end up as the best performer after training. But just what is
learning ability? Is it a unitary individual differences construct, or are there multiple
varieties of learning ability?

Rumelhart and Norman (1978) have distinguished three types of learning, which they call
accretion, restructuring, and tuning. Accretion refers to the accumulation of facts;
restructuring, to the development of new cognitive procedures; and tuning, to the process of
making existing cognitive procedures more efficient. Viewed along these lines, the previous
section reviewed studies on individual differences in tuning processes. In the present section,
the focus is on individual differences in the other forms of learning, particularly accretion.

Why is the examination of accretion processes important? In the conventional testing
setting, the assessment procedures yield data that reflect the current skill level of an
examinee. These data certainly must reflect to some degree the amount of prior exposure an
individual has had to problems of the type administered. Yet it is possible that individuals
reliably differ in their learning rate, and thus relations between initial level and final level
might be attenuated as a result. This attenuation would manifest itself in validity
coefficients lower than might be obtained if learning rate were considered along with initial
level. With this possibility as a motivation, a number of studies conducted in the last few
years have been concerned with determining whether learning rate is likely to be an important
variable for consideration in future assessment batteries. In the past, it has been difficult to
conduct studies of this nature because of the difficulty of exercising sufficient control over
stimulus presentation and response feedback, both of which may be important in investigating
dynamic learning processes, but the current widespread availability of microcomputers alleviates
this problem.

Two recent studies by Allen and Morgan and colleagues were concerned with the relationship
between initial level and learning rate and with the relationship between learning rate and
conventional ability measures. In the first study (Allen, Secundo, Salas, & Morgan, 1983), 70
college students were administered three learning tasks (a fourth task was administered but data
were not analyzed due to a computer malfunction). In a Coded Messages Task, subjects studied 12
word-symbol pairs, then made a series of same-different judgments on the equivalence of sentences
and symbol strings. For example, an examinee might be questioned about whether the sentence
"Enemy aircraft approaching from the North" was equivalent to the string "* I @ -." In an
Emergency Procedures Task, subjects studied a set of procedures on how to handle emergencies and
then were tested for their knowledge of the serial order of procedures. In a Security Checking
Task, subjects studied a map of landmarks on a hypothetical Air Force base, where each landmark
had an associated security level (high, medium, or low). In the test, subjects were asked
questions such as "What is the security level of the second low security location after the
tower?" Allen et al. computed three scores for each subject. An initial learning level score
was the sum of correct responses during the first third (17 minutes) of the task, a final level
score was the sum of correct responses during the final third of the task, and a rate score was
the difference between the two level scores. The important findings were, first, that initial
learning level, by itself, did not predict final level accurately; the rate variable
significantly added to the prediction. Second, initial learning level was related across the
different learning tasks, but rate and final level scores were not. Recall that Pellegrino
(1984) found much the same result in his simpler classification and matching tasks.

In a follow-on study (Allen, Salas, Pitts, Terranova, & Morgan, 1984), these investigators
readministered the same four learning tasks and three additional learning tasks (to separate
groups of 63 and 60 students, respectively) along with a battery of conventional aptitude tests.
The purpose of this study was to determine whether final levels of performance could be predicted
solely by conventional test scores or whether scores from learning tasks would account for
additional variance. They found that factor scores derived from an analysis of conventional
ability tests were significantly related to final performance on four of the seven learning tasks
(with R^2 ranging from 0.29 to 0.38) but that, on all but one task, goodness of prediction of
final level was significantly enhanced by the inclusion of learning rate variables in the
prediction equation (increment in R^2 ranging from 0.07 to 0.24). This result suggests that
learning rate may be a reliable and somewhat generalizable individual difference variable that is
not currently measured by conventional tests. A more compelling demonstration of the utility of
learning rate measures, however, would show that such measures predict final performance levels
in a long-term learning environment such as that found in standard 2-month technical training
courses in industry or the military.
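
The incremental-variance logic of this kind of analysis can be sketched for the simplest case of one conventional predictor and one learning rate predictor. The example below is only an illustration: the scores are invented, the function names are hypothetical, and the two-predictor formula for squared multiple correlation is assumed in place of whatever regression procedure Allen et al. actually used.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r2_increment(criterion, base, added):
    """R^2 for the base predictor alone, for both predictors, and the gain."""
    r_yb = pearson_r(criterion, base)
    r_ya = pearson_r(criterion, added)
    r_ba = pearson_r(base, added)
    r2_base = r_yb ** 2
    # Squared multiple correlation for two predictors.
    r2_full = (r_yb ** 2 + r_ya ** 2 - 2 * r_yb * r_ya * r_ba) / (1 - r_ba ** 2)
    return r2_base, r2_full, r2_full - r2_base


# Invented scores for eight examinees (not data from the studies).
aptitude = [1, 2, 3, 4, 5, 6, 7, 8]
rate = [3, 1, 4, 1, 5, 9, 2, 6]
# Final level is built as an exact sum so that both predictors together
# account for all of the criterion variance in this toy example.
final_level = [a + r for a, r in zip(aptitude, rate)]

r2_base, r2_full, increment = r2_increment(final_level, aptitude, rate)
print(f"R^2 aptitude only = {r2_base:.2f}, both = {r2_full:.2f}, "
      f"increment = {increment:.2f}")
```

With real data the increment would be tested for statistical significance, and the predictors would rarely account for all of the criterion variance as they do in this constructed case.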

Thus, in a third study in the series, Allen, Pitts, Jones, and Morgan (1985) explored the
utility of their learning rate measures in predicting final performance levels (course grade) in
technical college courses (computer science, N = 90; bacteriology, N = 66; and engineering,
N = 48). The major hypothesis tested was that learning task parameters (slope and intercept) would
add to the utility of the standard measures (high school grade point average and Scholastic
Aptitude Test scores) in predicting final course grade. The intercept of the learning function
in the Allen et al. study reflected the amount learned during the pre-performance instructional
phase of the task; the slope reflected the average amount of performance improvement during each
minute of the task. Thus, both parameters were in some way reflective of learning rate.

Analyses showed that the learning rate measures were significant predictors of final grade in
all three courses when considered separately and that, in some cases, the rate measures accounted
for additional variance in final grade beyond that accounted for by either high school GPA or SAT
scores. However, the rate measures did not contribute to the predictive efficiency of the
equation that included both GPA and SAT scores. Setting aside the issue of statistical power,
this result could be interpreted as indicating that learning rate is already reflected to some
degree in either or both the GPA and SAT scores.

Nevertheless, the series of studies taken together demonstrates that even a fairly rough
approach to the analysis of learning may be profitable in providing assessment measures with
practical utility. Yet there is a need to consider, at a somewhat more basic level, what it is
that contributes to differences between people in variables such as learning rate. The
theoretical framework outlined earlier in this chapter, for example, could be read to imply that
differences in more fundamental source variables are responsible for differences in general
acquisition proficiency. If so, then findings such as some of those emerging from the studies
just reviewed need themselves to be explained in more detail. Recall the earlier argument that
the reason such detailed analysis is sought, apart from general scientific pursuits, is to
provide potentially greater adaptability and flexibility in a system of assessment.

In a recent study, Kyllonen, Tirre, and Christal (1984) addressed directly the question of
how processing speed and efficiency, along with factual and procedural knowledge, can play a
central role in learning. The logic employed ran as follows. In typical paired associates
learning, the likelihood that a person will recall the response term, given the stimulus cue,
depends on the density of the memory structure that was created at the time of study. That is,
if a person can create a highly elaborated, interactive structure that connects the stimulus and
response terms during study, then it is more likely that the pair will be successfully retrieved
(or recognized) at test. Alternatively, if the learner fails to create any structure that links
items together at study time, then the probability of retrieval is reduced. This is essentially
the finding that underlies the utility of mnemonic devices.

A richer long-term memory structure allows greater opportunity for retrieval because it
allows more entry points to access that portion of the structure that is relevant to the memory
task. The hypothesis was that subjects with high verbal knowledge (i.e., those who score well on
standard vocabulary tests) come to the experimental session with an already well-developed
declarative memory structure for words and associated concepts. If given plenty of time for
study, these subjects will have a distinct advantage over their low verbal knowledge counterparts
in their ability to integrate stimulus and response terms and thus successfully retrieve pairs at
test. The reason for this is that the concepts activated during study should be more highly
integrated and thus serve as good cues, or entry points, at test. The paucity of structure
characterizing low verbal subjects leads to the activation of fewer and perhaps less distinctive
concepts, thus leading to more retrieval difficulties at test.

If study time is short, on the other hand, the advantage for high verbal subjects might not
be as great. Although by the logic above there should still be some advantage, the most
critical variable in establishing a connected structure to facilitate recall should be the rate
at which relevant concepts can be retrieved. In sum, the prediction was that, with liberal study
time, the high verbal knowledge individual should have a distinct advantage; with limited study
time, the fast verbal processor should have the distinct advantage.

To test these hypotheses, a series of cognitive tasks was presented to Air Force enlistees;
these tasks were designed to assess the enlistees' breadth of verbal knowledge and the speed with
which they were able to access verbal concepts. A paired associates learning task was also
administered to subjects. In the task, pairs were presented for study at one of five rates
ranging from 0.5 to 8 seconds.

Two hypotheses related to these notions about learning ability were tested. The first was that
an individual's likelihood of correctly responding on items in which pairs were presented at the
slow rate (8 s) would be more highly related to the breadth of the individual's verbal knowledge
than would be the likelihood of correctly responding on items presented at the fast rate (0.5 s).
In both cases, some relationship was taken for granted but was expected to be greater in the 8 s
condition. The second hypothesis was that the reverse relationship should hold when the variable
of interest was verbal processing speed. That is, probability correct at the 0.5 s rate should
be more highly related to processing speed than probability correct at the 8 s rate. Indeed,
there was no compelling reason to believe that processing speed should have any effect on
probability correct at the 8 s rate.

Although the analysis is fairly complex, it was found that under certain conditions the
expected relationships did hold. The relationship between learning success and word knowledge
was higher in the 8 s condition than in the 0.5 s condition, and the relationship between
learning success and verbal processing speed was higher in the high speed (0.5 s) study condition
than in the leisurely 8 s condition. Because a number of measures of processing speed were
included, each designed to tap a different configuration of psychological processes, it was
possible to isolate the source of processing speed differences operating in the paired associates
task. The analysis showed that simple motor speed differences, or even simple comparison speed
differences, could be ruled out. The critical speed seemed to have been how quickly an
individual was able to search memory to retrieve a relevant concept.


One of the implications to be drawn from this study is related to the use to which verbal (or
more generally, semantic) processing speed tasks might be put in application efforts. Much
attention has been given in recent years to the letter classification task as a useful index of
verbal ability (Hunt, 1978). This results from the modest but apparently reliable relationship
found between response time measures on the task and composite verbal aptitude measures. The
original interest in the finding was related entirely to the theoretical issue of understanding a
cognitive component that might have a causal linkage to the development of verbal knowledge. But
there might also have been the implicit belief that the task, along with other similar tasks,
might somehow serve as a replacement for knowledge-dependent verbal aptitude tests. The Kyllonen
et al. study suggests a more appropriate use for verbal processing speed tests. That is, under
conditions of high information flow, such as those experienced in the cockpit or a variety of
similar situations, such a measure of processing efficiency might predict who will remember
information most effectively as it flows through in real time. (See Christal, Tirre, & Kyllonen,
1984, for further discussion along these lines.)

Although the studies reviewed in this section present provocative findings, the limitations
of this research should be emphasized. In both the Allen et al. and the Kyllonen et al. studies,
the criterion tasks were fairly simple fact acquisition tasks. Understanding the relationships
between parameters on such tasks, and understanding the cognitive determinants of performance on
such tasks, is only a useful first step to understanding what it is that causes some to learn
faster than others. What is needed to make true progress in understanding learning ability is
the analysis of how skills develop in the context of more realistic and long-term learning
environments. The typical validity studies that identify correlates of final GPA after weeks of
technical training are not the solution, in that they present a whole class of new problems,
mostly related to the lack of control over the conditions of learning and the failure to
yield dynamic indices of learning progress.

A new development that seems promising at present is the analysis of learning in the
computerized intelligent tutoring system environment, where a great deal of information about the
progress a student is making could be computed (in principle), and full control over the
conditions of learning is possible. One study conducted along these lines compared a variety of
aptitude and motivation test scores to a whole host of dynamic learning variables (Snow,
Wescourt, & Collins, 1983). The study, although only a pilot (in that only a small number of
students were available for analysis), showed the great potential for integrating cognitive
assessment methods with computerized instruction in an effort to discover the underlying sources
of the ability to learn. It is likely that further efforts of this type will be forthcoming.


VI.  ANALYSIS  OF  COGNITIVE  SKILL 

One of the more exciting recent developments in cognitive assessment comes from the analysis
of complex cognitive skills such as reading, troubleshooting (of electronic systems), typing,
mathematics problem solving, and computer programming. Much of this research employs many of the
same techniques and methods employed by researchers investigating learning and more elementary
cognitive skills, but the goals of the research tend to center around the issue of the nature of
basic skills for potential training applications rather than selection and classification
applications. The general idea is that if the underlying cognitive constituents of complex
behavior are understood, more effective diagnoses of learning and performance disabilities might
be possible, and these in turn might lead to more effective remedial training.

One particularly nice application of this strategy can be found in a systematic program of
research studies conducted by Frederiksen and colleagues in the area of general reading skills
(Frederiksen, Weaver, Warren, Gillotte, Rosebery, Freeman, & Goodman, 1983). Initial work on
this project concerned the identification of the components of reading (Frederiksen, 1981; 1982)
through the use of differential assessment techniques of the type discussed throughout this
chapter. The research strategy was to administer cognitive tasks such as letter matching, word
recognition, and anagram encoding, and to test various models that accounted for the pattern of
relationships among measures derived from the tasks. This analysis then resulted in an
identification of component skills, individual estimates of which in turn were correlated with
scores on standard reading tests to determine the skills that differentiated good from poor
readers. Frederiksen et al. (1983) then selected three of the components as particularly
critical for reading, and developed specific computerized remedial training of the components.
The training proved successful on a series of reading tasks, but even more interesting was the
fact that Frederiksen et al. administered a variety of cognitive criterion measures to identify
which training had an effect on which particular set of skills.

The Frederiksen study is only one of a number of similar (if not as wide-scoped and
systematic) studies that have compared experts in a particular subject-matter domain with novices
in an attempt to define the underlying component skills of expertise. Thus far, research has
tended toward the analysis of more academic expertise, such as physics (Chi, Glaser, & Rees,
1981; Larkin, McDermott, Simon, & Simon, 1980), but it is not out of line to imagine the analysis
of more prosaic but nevertheless critical areas of technical training.

Recently, the Air Force has become convinced that new cognitive methods of analysis hold the
key to a redefinition of what it is that constitutes a "basic skill." Gott and Davis (1983) have
related this reconceptualization to a switch from a focus on what they call a power-based
strategy for assessing general facility to a knowledge-based approach that recognizes the narrow,
domain-specific nature of cognitive skill. Traditionally, both within and outside the military
system, basic skills have been defined as the three Rs of reading, writing, and arithmetic.
However, there is a growing realization that skills defined at this level of generality do not
lead easily to prescriptions for how remediation of skill deficiencies can be accomplished.
There is the hope that a more fundamental domain-specific characterization of skill might more
naturally suggest techniques for overcoming particular deficiencies.

In the first large-scale effort of this kind, two occupational specialties in the Air Force
(Jet Engine Mechanics and Avionics Troubleshooting) have been examined (Bond, Eastman, Gitomer,
Glaser, & Lesgold, 1983). In an extensive dissertation, Gitomer (1984) has documented a number
of studies concerned with identifying the basic skills involved in troubleshooting electronic
aircraft equipment. The work was motivated by the observation that there are tremendous
differences between first-term airmen in their ability to perform troubleshooting effectively,
despite the fact that the airmen considered had all completed extensive technical training.
Based on ratings by supervisors, Gitomer divided 16 airmen into two groups (N = 7 high-skilled
and N = 9 low-skilled) and proceeded to administer a series of tasks to both groups to isolate
the source of the skill difference. Tasks ranged in complexity from a complex troubleshooting
simulation to simple picture-name classification. Some were variants on standard cognitive tasks
discussed elsewhere in this paper, such as one that required examinees to identify the name of a
pictured component, which bears a resemblance to the name identity task. Others, such as a series
of component clustering tasks and some open interview tasks (e.g., "tell me all you know about
azimuth hydraulic actuators"), have been used in connection with studies on physics expertise
(Chi, Feltovich, & Glaser, 1981). Still others were fairly domain-specific cognitive tasks that
resulted from a careful task analysis of troubleshooting activities. An example is the Logic
Gate Computation task, in which an examinee is required to fill in a blank given a partially
complete logic gate truth table (e.g., given the relationship, "NAND," and the input values,
high and low, what is the output value?).
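
As a rough illustration of the kind of item just described, the following Python sketch computes
the correct answer to a two-input logic gate question and scores an examinee's response. The set
of gates, the function names (gate_output, score_item), and the item format are assumptions made
here for illustration only; they are not taken from Gitomer's (1984) task materials.

```python
# Hypothetical sketch of a Logic Gate Computation item.
# Gate set, names, and item format are illustrative assumptions,
# not a reconstruction of the original task materials.

GATES = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
    "XOR": lambda a, b: a != b,
}

LEVEL_TO_BOOL = {"high": True, "low": False}


def gate_output(gate, first_input, second_input):
    """Return 'high' or 'low' for a two-input gate given 'high'/'low' inputs."""
    a = LEVEL_TO_BOOL[first_input]
    b = LEVEL_TO_BOOL[second_input]
    return "high" if GATES[gate](a, b) else "low"


def score_item(gate, first_input, second_input, response):
    """Return True if the examinee's response matches the gate's output."""
    return response == gate_output(gate, first_input, second_input)


# The example item from the text: NAND with inputs high and low.
print(gate_output("NAND", "high", "low"))
```

For the example item in the text (NAND with inputs high and low), the correct response is
"high," because a NAND gate outputs low only when both inputs are high.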

From the pattern of differences found over the 13 tasks administered, Gitomer was able to
paint a picture of the constitution of troubleshooting skill. He found that the more skilled
performers differed from the less skilled in that they were driven by better specified goals more
consistent with task demands, they had more methods available to them for attacking a problem,
they were able to execute such methods more efficiently, and they were better able to select
appropriate problem solving methods across different situations. Some of the areas in which
differences were not found may be as revealing as areas of difference. Gitomer found no effect
for time in training, and he found (surprisingly) no differences between the two groups on their
Electronics Aptitude score. He also found that both groups had poor knowledge of general
electronics principles, presumably because after the first few weeks of formal training, such
general principles played no part in job task activities.

One of the important issues related to studies along the lines of this one is the question of
how general basic skills are. Although the generality issue cannot be addressed systematically
on the basis of a single study, there is some evidence from Gitomer's work that not all the
differences were narrow domain-specific differences. The logic gate task, although directly a
part of avionics troubleshooting activities, is actually quite general in that it plays a role in
a broad variety of complex cognitive activities such as logical analysis and computer
programming, and the task revealed substantial differences between high and low skilled
performers. Although Gitomer did not completely spell out the reasons why the difference might
have shown up so clearly, it is possible that the cause may have been related to differences in
general working memory capacity, one of the sources identified in the interactive common sources
framework (see Figure 1).

The important point to draw from this work, which is really only in its preliminary stages,
is that through the administration of carefully constructed cognitive tasks it should prove
possible to isolate the sources of differences among people who perform complex activities with
differing degrees of proficiency. The results of such analyses should be prescriptive statements
about how the less skilled individuals might be tutored to overcome specific deficits. In this
regard, Gitomer has shown how the lack of differences in many of the tasks he administered is
actually quite an encouraging sign, in that it demonstrates that the poor troubleshooters are not
simply worse at all cognitive skills, but rather suffer particular and isolable
deficiencies. Powerful prescriptions for training are much more likely to result from
consideration of these specific deficiencies than from global recommendations to train people
to "read better."


VII.  SUMMARY 

The purpose of cognitive assessment is to apply current understanding of how people think,
learn, and remember toward the measurement of an individual's proficiency level in these
activities. The most obvious way in which a new technology for cognitive assessment might have
an impact is in improving present selection and classification systems. In this paper, some of
the most recent attempts to explore issues related to the feasibility of new measurement methods
were reviewed. It appears that individual differences on elementary cognitive tasks are
generally substantial, and there is evidence that such differences are not being captured by
conventional ability tests. This suggests a role for cognitive tasks as supplements to
conventional tests, and it may be that they will be particularly valid performance predictors for
specialized occupations that require particular kinds of psychological processing. A second
possible way in which current selection and classification systems might be supplemented is with
measures that directly assess changes in processing efficiency as a function of practice.
Because the form of such changes determines an examinee's expected performance level at the end
of training or practice, it is reasonable to expect that current test batteries, which assess an
examinee's current state of knowledge, can be profitably augmented by including measures of
changes in processing efficiency.


There has been much discussion recently about what possibilities microcomputers hold for
changing the way assessment is accomplished. Much of this discussion centers on adaptive testing
technology and computerized versions of existing aptitude tests (Moreno, Wetzel, McBride, &
Weiss, 1983; Weiss, 1983), but more recently, attention has increasingly turned toward the issue
of whether new abilities can now be measured (Belmont, 1983; Hunt, 1982). In a thoughtful
review, Hunt and Pellegrino (1984) have discussed changes that computerized testing can bring in
assessment of the traditional spatial, verbal, and reasoning abilities, as well as in the
previously unmeasured abilities related to learning, attention, and psychomotor skills, and thus
their report might be read as a supplement to the views expressed in this paper. Hunt (1982)
concluded, in his earlier paper, that although it is not yet practically feasible, a concerted
5-year research program that aimed to consolidate cognitive measurement techniques and explore
their utility as intelligence tests might have important long-term benefits in leading to
increased predictive validity of aptitude batteries. This seems to sum up the current state of
the art in ability measurement: The new cognitive assessment techniques show promise, but
considerable extra research effort will be required before such tests will be feasible for
personnel decision-making purposes in operational settings. Further, truly significant
strides in ability measurement applications await further developments in establishing adequate
criterion measures. One particularly promising area in this regard is the use of computerized
instructional environments to serve as test beds in validation research.

An area that may benefit more quickly from new forms of cognitive assessment has
to do with the identification of basic skills. Two studies were reviewed here that demonstrated
how a careful task analysis followed by a comparison of performers at various skill levels can
lead to the identification of component skills. Skills identified in such a fashion tend to be
less general than the traditional skills of reading, writing, and arithmetic, and by virtue of
the specificity of such skills, cognitive diagnosis is more easily accomplished and prescriptions
for training specific deficits more naturally result. Here again, the general approach is only
beginning to be explored, but given the tremendous cost of training, the benefits of such an
approach are likely to be realized in the near future, and diverse application efforts are likely
to be seen.

In sum, cognitive assessment of the type that has been the main focus of this paper is a
promising technology, but one that is not yet ready to be applied in the workplace. As the cost
of already fairly inexpensive microcomputers comes down even further, while at the same time
applied research programs provide more and more demonstrations of the utility of new and diverse
forms of cognitive assessment, some of the best ideas in this field are likely to be transferred
to practical applications. Such a move will significantly expand the ideas on where, how, and
for what purpose people's cognitive capabilities should be measured.